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Abstract. This final report consists of two main parts. The first is taken from a 
paper by the PiCA Group which describes our ongoing work in fast computation 
of n-point correlation functions, (see Moore et al. 2001). 

We present here a new algorithm for the fast computation of N-point correlation 
functions in large astronomical data sets. The algorithm is based on fcd-trees which 
are decorated with cached sufficient statistics thus allowing for orders of magnitude 
speed-ups over the naive non-tree-based implementation of correlation functions. 
We further discuss the use of controlled approximations within the computation 
which allows for further acceleration. In summary, our algorithm now makes it pos- 
sible to compute exact, all-pairs, measurements of the 2, 3 and 4-point correlation 
functions for cosmological data sets like the Sloan Digital Sky Survey (SDSS, York 
et al. 2000) and the next generation of Cosmic Microwave Background experiments 
(see Szapudi et al. 2000). 

The second part summarizes the progress made by the PiCA Group in this area 
through the AISR grant. 


1 Introduction 

Correlation functions are some of the most widely used statistics within as- 
trophysics (see Peebles 1980 for a extensive review). They are often used to 
quantify the clustering of objects in the universe ( e.g . galaxies, quasars etc.) 
compared to a pure Poission process. More recently, they have also been used 
to measure fluctuations in the Cosmic Microwave Background (see Szapudi 
et al. 2000). On large scales, the higher-order correlation functions (3-point 
and above) can be used to test several fundamental assumptions about the 
universe; for example, our hierarchical scenario for structure formation, the 
Gaussianity of the initial conditions as well as testing various models for the 
biasing between the luminous and dark matter. The reader is referred to Sza- 
pudi (2000), Szapudi et al. (1999a, b) and Scoccimarro (2000; and references 
therein) for an overview of the usefulness of iV-point correlation functions in 
constraining cosmological models. 

Over the coming decade, several new, massive cosmological surveys will 
become available to the astronomical community. In this new era, the qual- 
ity and quantity of data will warrant a more sophisticated analysis of the 
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higher-order correlation functions of galaxies (and other objects) over the 
largest range of scales possible. Our ability to perform such studies will be 
severely limited by the computational time needed to compute such functions 
and no-longer by the amount of data available. In this paper, we address this 
computational “bottle-neck” by outlining a new algorithm that uses innova- 
tive computer science to accelerate the computation of iV-point correlation 
functions far beyond the naive 0(R N ) scaling law (where R is the number of 
objects in the dataset and N is the power of correlation function desired). 

The algorithm presented here was developed as part of the “Computa- 
tional AstroStatistics” collaboration (see Nichol et al. 2000) and is a member 
of a family of algorithms for a very general class of statistical computations, 
including nearest-neighbor methods, kernel density estimation, and cluster- 
ing. The work presented here was initially presented by Gray k Moore (2001) 
and will soon be discussed in a more substantial paper by Connolly et al. 
(2001). In this conference proceeding, we provide a brief review of Ad-trees 
(Section 2), a discussion of the use of Ad- trees in range searches (Section 
3), an overview of the development of a fast iV-point correlation function 
code (Section 4) as well as presenting the concept of controlled approxima- 
tions in the calculation of the correlation function (Section 5). In Section 6, 
we provide preliminary results on the computation speed-up achieved with 
this algorithm and discuss future prospects for further advances in this field 
through the use of other tree structures. 


2 Review of Ad- trees 

Our fast iV-point correlation function algorithm is built upon the Ad-tree 
data structure which was introduced by Friedman et al. (1977). A Ad-tree 
is a way of organizing a set of datapoints in A-dimensional space in such a 
way that once built, whenever a query arrives requesting a list all points in 
a neighborhood, the query can be answered quickly without needing to scan 
every single point. 

The root node of the Ad-tree owns all the data points. Each, non-leaf-node 
has two children, defined by a splitting dimension ti.SplitDim and a splitting 
value ti.SplitValue. The two children divide their parent’s data points between 
them, with the left child owning those data points that are strictly less than 
the splitting value in the splitting dimension, and the right child owning the 
remainder of the parent’s data points: 

€ 71. Left X^[?1.SplitDim] < ti.SplitValue and Xt €n (1) 

Xj € 71 . Right O Xffn.SpLiTDiM] > Ti.SplitValue and X* e n (2) 

As an example, some of the nodes of a fed-tree are illustrated in Figures 1. 

fed-trees cure usually constructed top-down, beginning with the full set of 
points and then splitting in the center of the widest dimension. This produces 
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Figure lc: The third level contains four ■ 

nodes. Note how a parent node creates its Figure Id: The set of nodes in the sixth level 
two children by splitting in the centers of its of the tree, 
widest dimension 


two child nodes, each with a distinct set of points. This procedure is then 
repeated recursively on each of the two child nodes. 

A node is declared to be a leaf, and is left unsplit, if the widest dimension 
of its bounding box is < some threshold, MinBoxWidth. A node is also left 
unsplit if it denotes fewer than some threshold number of points, r m i n . A leaf 
node has no children, but instead contains a list of A;-dimensional vectors: 
the actual datapoints contained in that leaf. The values MinBoxWidth = 0 
and r m in = 1 would cause the largest ta-tree structure because all leaf nodes 
would denote singleton or coincident points. In practice, we set MinBoxWidth 


4 


Jeff Schneider and Andy Connolly 


to 1% of the range of the data point components and r mm to around 10. The 
tree size and construction thus cost considerably less than these bounds be- 
cause in dense regions, tiny leaf nodes are able to summarize dozens of data 
points. The operations needed in tree-building are computationally trivial 
and therefore, the overhead in constructing the tree is negligible. Also, once 
a tree is built it can be re-used for many different analysis operations. 

Since the introduction of Ad-trees, many variations of them have been 
proposed and used with great success in areas such as databases and com- 
putational geometry (Preparata & Shamos 1985). R-trees (Guttman 1984) 
are designed for disk resident data sets and efficient incremental addition of 
data. Metric trees (see Uhlmann 1991) place hyperspheres around tree nodes, 
instead of axis-aligned splitting planes. In all cases, the algorithms we discuss 
in this paper could be applied equally effectively with these other structures. 
For example, Moore (2000) shows the use of metric trees for accelerating 
several clustering and pairwise comparision algorithms. 


3 Range Searching 

Before proceeding to fast 7V-point calculations, we will begin with a very 
standard Ad-tree search algorithm that could be used as a building block for 
fast 2-point computations. 

For simplicity of exposition we will assume the every node of the Ad-tree 
contains one extra piece of information: the bounding box of all the points 
it contains. Call this box u.BoundBox. The implication of this is that every 
node must contain two new k dimensional vectors to represent the lower 
and upper limits of each dimension of the bounding box. The range search 
operation takes two inputs. The first is a Ac-dimensional vector q called the 
query point The second is a separation distance s*,. The operation returns 
the complete set of points in the Ad-tree that lie within distance Shi of q. 

• RangeSearch(n, q, Shi) 

Returns a set of points S such that 

x e S x € n and Q| — Shi (3) 

• Let MinDist := the closest distance from q to ti.boundBox. 

• If MinDist > s^i then it is impossible that any point in n can be within 
range of the query. So simply return the empty set of points without 
doing any further work. 

• Else, if n is a leaf node, we must iterate through all the datapoints in its 
leaf list. For each point, find if it is within distance Shi of q. If so, add it 
to the set of results. 

• Else, n is not a leaf node. Then: 

— Let Sieft := RangeSearch(n.LEFT,q, Shi) 

— Let S r ight '.= RangeSearch (n. Right, q, 5/jj) 
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- Return 5i e ft U S r i g ht- 

Figure 2a shows the result of running this algorithm in two dimensions. 
Many large nodes are pruned from the search. 117 distance calculations were 
needed for performing this range search, compared with 499 that would have 
been needed by a naive method. 

Note that it is not essential that Ad-tree nodes have bounding boxes ex- 
plicitly stored. Instead a hyper-rectangle can be passed to each recursive call 
of the above function and dynamically updated as the tree is searched. 

Range searching with a Axi-tree can be much faster than without if the 
range is small, containing only a small fraction of the total number of data- 
points. But what if the range is large? Figure 2b shows an example in which 
Ad-trees provide little computational saving because almost all the points 
match the query and thus need to be visited. In general this problem is un- 
avoidable. But in one special case it can be avoided — if we merely want to 
count the number of datapoints in a range instead of explicitly find them all. 

3.1 Range Counting and Cached Sufficient Statistics 

We will add the following field to a Ad-tree node. Let u-NumPoints be the 
number of points contained in node n. This is the first and simplest of a set 
of fcd-tree decorations we refer to as cached sufficient statistics (see Moore 
& Lee 1998). In general, we frequently stored the centroid of all points in a 
node and their covariance matrix. 
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Once we have n.NuMPoiNTs it is trivial to write an operation that counts 
the number of datapoints within some range without explicitly visiting them. 

• RangeCount(n,q, s*,) 

Returns an integer: the number of points that are both inside the n and 
also within distance s^ of q. 

• Let MinDist := the closest distance from q to h-BoundBox. 

• If MinDist > then it is impossible that any point in n can be within 
range of the query. So simply return 0. 

• Let MaxDist := the furthest distance from q to ti.BoundBox. 

• If MaxDist < sm then every point in n must be within range of the 
query. So simply return n.NuMPoiNTs. 

• Else, if n is a leaf node, we must iterate through all the datapoints in 
its leaf list. Start a counter at zero. For each point, find if it is within 
distance Shi of q. If so, increment the counter by one. Return the count 
once the full list has been scanned. 

• Else, n is not a leaf node. Then: 

- Let Cleft “ RangeCount(n.LEFT, guery, sm) 

— Let C r ight := RangeCount(n. Right, query, Shi) 

Return Cjeft d - C r jght- 

The same query that gave the poor range search performance in Figure 
2b gives good performance in Figure 3. The difference is that a second type 
of pruning of the search is possible: if the hyperrectangle surrounding the n 
is either entirely outside or inside the range then we prune. 

4 Fast iV-point Correlation Functions 

4.1 The Single Tree Approach to Two- Point Computation 

It is easy to see that the 2-point correlation function is simply a repeated set 
of range counts. For example, given a minimum and maximum separation s/ 0 
and Shi we run the following algorithm: 

• S ingleTree2 Point (X, n, si 0 , s^) 

Input X is a dataset, represented as a matrix in which the fcth row corre- 
sponds to the fcth datapoint. X has R rows and fc columns. Input n is the 
root of a kdtree built from the data in X. Output integer: the number of 
pairs of points (x*,Xj) such that si Q < |x* — Xj| < . 

• C := 0 

• For i between 1 and R do: 

- C := C + RangeCount(n,Xj, s^) — RangeCount(n,Xi, s/ 0 ) 

Note that in practice we do not use two range counts at each iteration, but 
one slightly more complex rangecount operation 

RangeCountBetweenSeparations(n, q, si 0 , Shi) 

that directly counts the number of points whose distance from q is between 
si 0 and 
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Fig. 3. When doing 
a range count, nodes 
entirely within range 
can also be pruned 
and added to the 
total count. This ad- 
ditional pruning adds 
significant speed-ups 
to the slower range 
count discussed in 
Figure 2b. Now one 
just spends time 
studying nodes on 
the boundary of the 
range count. 


4.2 The Dual Tree Approach to Two-Point Computation 

The previous algorithm iterates over all datapoints, issuing a range count 
operation for each. We can save further time by turning that outer iteration 
into an additional kd-tree search. The new search will be a recursive procedure 
that takes two nodes, n a and n b , as arguments. The goal will be to compute 
the number of pairs of points (x,y) such that x E n a , y E n b , and s lo < 

|x — y| < s hi . 

• DualTreeCount (n a , n b , $r 0 , $/») 

Returns an integer: the number of pairs of points (x,y) such that x E n a , 
y E n b , and sj 0 < |x — y| < sm. 

• Let MinDist := the closest distance between n a .BouNDBox and 7i b .boundBox. 

• If MinDist > Shi then it is impossible that any pair of points can match. 

So simply return 0. 

• Let MaxDist := the furthest distance between n a . B oundBox and n b .BouNDBox. 

• If MaxDist < si D then it is again impossible that any pair of points can 
match. So simply return 0. 

• If sio < MinDist < MaxDist < s hi then all pairs of points must match. 
Use n a . num Points and 7 i b .NuM P oints to compute the number of resulting 
pairs ?i a .N um Points x 7 i b .NuMPoiNTs, and return that value. 

• Else, if n a and n b are both leaf nodes, we must iterate through all pairs 
of datapoints in their leaf lists. Return the resulting (slowly computed) 
count. 
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• Else at least one of the two nodes is a non-leaf. Pick the non-leaf with the 
largest number of points (breaking ties arbitrarily), and call it n*. Call 
the other node n~. Then: 

- Let C] e ft := DualTreeCount(n\LEFT,n“, sm) 

- Let C r ight := DualTreeCount(n*. Right, n _ , s/ 0 , Shi) 

Return C|gft ~b bright * 


Computing a 2-point function on a dataset X then simply consists of com- 
puting the value C = DualTreeCount(n root ,n TOoU si a , s^), where n roo t is 
a kd-tree built from X, for a range of bins with minimum and maximum 
boundaries of st Q and hisep . We note here that the 2-point correlation func- 
tion, the quanity of interest is not simply C, but C/2 (the number of unique 
pairs of objects). 

A further speed-up can be obtained by simultaneously computing the 
DualTreeCount(n root , n roo t> 5/ 0 , s^) over a series of bins. We will discuss 
this in further detail in Connolly et al. (2001). 


4.3 Redundancy Elimination 

So far, we have discussed two operations - exclusion and subsumption - which 
remove the need to traverse the whole tree thus speeding-up the computation 
of the correlation function. Another form of pruning is to eliminate node- node 
comparisons which have been performed already in the reverse order. This 
can be done simply by (virtually) ranking the datapoints according to their 
position in a depth-first traversal of the tree, then recording for each node the 
minimum and maximum ranks of the points it owns, and pruning whenever 
n a ’s maximum rank is less than rib’s minimum rank. This is useful for all- 
pairs problems, but will later be seen to be essential for all-k-tuples problems. 
This kind of pruning is not practical for Single-tree search. 


4.4 Multiple Trees Approach to iV-Point Computation 

The advantages of Dual-Tree over Single-Tree are so far two fold. First, Dual- 
tree can be faster, and second it can exploit redundancy elimination. But two 
more advantages remain. First, we can extend the “2-tree for 2-point” method 
up to “N-trees for N-point”. Second (discussed in Section 5.1), we can perform 
effective approximation with Dual-trees (or n-trees). We now discuss the first 
of these advantages. 

The iV-point computation is parameterized by two nxn symmetric ma- 
trices: L and H . We wish to compute 

R R R 

£ £ • ...*„) 

* 1=1 *2 = 1 in — 1 
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where /(L, H,iu h, - - - in) is zero unless the following conditions hold (in 
which case it takes the value 1): 

VI < i < j < ft* ^ | < (^) 

We will achieve this by calling a recursive function FastNPoint on an n-tuple 
of kdtree nodes (m ,n 2 ...n n ). This recursive function much return 

X X E I(L,H,i u i 2 ,...i n ) (6) 

ii€nii2€n2 tn€«n 


• Fast NPoint(n a , rib, Sm) 

• Let AllSubsumed :=TRUE 

• For i — 1 to n do 

— For j = i + 1 to n do 

* Let Min D 1ST := the closest distance between n*. BoundBox and 
71j .BoundBox. 

* If MinDist > then it is impossible that any n-tuple of 

points can match because the distance between the ith and jth 
points in any such n-tuple must be out of range. So simply return 
°. 

* Let MaxDist := the furthest distance between n\. BoundBox and 
7lj .BoundBox. 

* If MaxDist < L^j) then similarly return 0. 

* If L {i)j ) < MinDist < MaxDist < then every n-tuple 

has the property the the ith member and jth member match. 
We are interested in whether this is true for all (i, j) pairs and 
so the first time we are disappointed (by discovering the above 
expression does not hold) then we will update the AllSubsumed 
flag. Thus the actual computation at this step is: 

If L( i?j ) > MinDist or MaxDist > H^j) then 
AllSubsumed :=FALSE. 

• If AllSubsumed has remained true throughout the above double loop, 
we can be sure that every n-tuple derived from the nodes in the recursive 
call must match, and so we can simply return 

n 

Tlj-NUMPOINTS (7) 

«=1 

• Else, if all of ni,n 2 , . . .n n are leaf nodes we must iterate through all 
n-tuples of datapoints in their leaf lists. Return the resulting (slowly 
computed) count. 

• Else at least one of the nodes is a non leaf. Pick the non-leaf with the 
largest number of points (breaking ties arbitrarily), and assume it has 
index i = i* . Then: 


10 


Jeff Schneider and Andy Connolly 


— Let Cjeft :=z Feist NPoint(ni , . . . , n t * .Left, . . . , ra n ) 

- Let Cright •= FastNPoint(ni , . . . , n? .right, . . . , n n ) 

Return Cleft d- Cright • 

The full iV-point computation is achieved by calling FastNPoint with 
arguments consisting of an n-tuple of copies of the root node. 

We should note once again it is possible to save considerable amounts of 
computation by eliminating redundancy. For example, in the 4-point statis- 
tic, the above implementation will recount each matching 4-tuple of points 
(x,y,z,w) in 24 different ways: once for each of the 4! permutations of 
( x , y , z, w). Again, this excess cost can be avoided by ordering the datapoints 
via a depth-first tree indexing scheme and then pruning any n-tuple of nodes 
violating that order. But the reader should be aware of an extremely messy 
problem regarding how much to award to the count in the case that a sub- 
sume type of pruning can take place. If all nodes own independent sets of 
points the answer is simple: the product of the node counts. If all nodes are 
the same then the answer is again simple: (j^), where n is the number of 
points in the node. Somewhat more subtle combinatorics are needed in the 
case where some nodes in the n-tuple are identical and others are not. And 
fearsome computation is needed in the various cases in which some nodes are 
descendants of some other nodes. 

5 Controlled Approximation 

In general, when the final answer comes back from FastNPoint, the ma- 
jority of the quantity in the count will be the sum of components arising 
from large subsume prunes. But the majority of the computational effort will 
have been spent on accounting for the vast number of small but unprunable 
combinations of nodes. We can improve the running time of the algorithm by 
demanding that it also prunes it search in cases in which only a tiny count 
of n-tuples is at stake. This is achieved by adding a parameter, T, to the 
FastNPoint algorithm, and adding the following lines at the start: 

• Let C max : = Or=l ri i'NUMPO!NT s 

• If C max < T then quit this recursive call. 

This will clearly cause an inaccurate result, but fortunately it is not hard to 
maintain tight lower and upper bounds on what the true answer would have 
been if the approximation had not been made. Thus FastNPoint (ni , ri 2 , . . . , n n , 
now returns a pair of counts (Ci 0 , Chi) where we can guarantee that the true 
count C lies in the range Ci Q < C < Chi- 


5.1 Iterative Deepening for Controlled Approximation 

Suppose the true value of the 7V-point function is C but that we are prepared 
to accept a fractional error of e: we will be happy with any value C ap prox such 
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Fig. 4. The com- 
putational time of 
our algorithm versus 
the size of dataset. 
The crosses are real 
2-dimensional pro- 
jected galaxy data 
while the dots are 
just drawn from a 
Poission distribution. 
The theoretically 
expected scaling law 
of nVn is shown 
and agrees well with 
the observed data. 
The naive N 2 law 
is also plotted for 
comparison 


that 

|C a p prox - C\ < eC (8) 

It is possible to adapt the n-tree algorithm using a best-first iterative deep- 
ening search strategy to guarantee this result while exploiting permission 
to approximate effectively by building the count as much as possible from 
“easy-win” node pairs while doing approximation at hard deep node-pairs. 
This is simply achieved by repeatedly calling the previous approximate algo- 
rithm with diminishing values of T until a value is discovered that satisfies 
Equation 8. 

6 Discussion 

We plan to present a more detailed discussion of the techniques presented 
here in a forthcoming paper (Connolly et al. 2001). That paper will also in- 
clude a full analysis of the computational speed and overhead of our iV-point 
correlation function algorithm and compare those with existing software for 
computing the higher-order correlation functions e.g. Szapudi et al. 1999a. 
However, in Figure 4, we present preliminary results on the scaling of com- 
putational timing needed for a 2-point correlation function as a function 
of the number of objects in the data set. For these tests, we computed all 
the data-data pairs for random data sets and real, projected 2-dimensional 
galaxy data. These data show that our 2-point correlation function algorithm 
scales as 0{N y/N) (for projected 2-dimensional data) compared to the naive 
all-pairs scaling of 0{N 2 ) where here N is the size of the dataset under con- 
sideration. To emphasis the speed-up obtained by our algorithm (Figure 4), 
an all-pairs count for a database of 10 7 objects would take only 10 hours (on 
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our DEC Alpha workstation) using our methodology compared to ~ 10,000 
hours (> 1 year) using the naive N 1 2 method. Clearly, binning the data would 
also drastically increase the speed of analyses over the naive all-pairs 0(N 2 ) 
scaling but at the price of lossing of resolution. 

Similar spectacular speed-ups will be achieved for the 3 and 4-point func- 
tions and we will report these results elsewhere (Connolly et al. 2001). Fur- 
thermore, controlled approximations can further accelerate the computations 
by several orders of magnitude. Such speed-ups are vital to allow Monte Carlo 
estimates of the errors on these measurements. In summary, our algorithm 
now makes it possible to compute an exact, all-pairs, measurement of the 2, 
3 and 4-point correlation functions for data sets like the Sloan Digital Sky 
Survey (SDSS). These algorithms will also help in the speed-up of Cosmic 
Microwave Background analyses as outlined in Szapudi et al. (2000). 

Finally, we note here that we have only touched upon one aspect of how 
trees data structures (and other computer science techniques) can help in 
the analysis of large astrophysical data sets. Moreover, there are other tree 
structures beyond Ad-trees such as ball trees which could be used to optimize 
our correlation function codes for higher dimensionality data. We will explore 
these issues in future papers. 


7 Further Progress 

Our progress in fast n-point correlation functions has centered in the following 
two areas: 

7.1 Marked and weighted n-point correlation 

A new algorithm was developed and implemented in the n-point software to 
do weighted n-point correlations. See Biesbart et al. (2002) for a description 
of marked and weighted n-point correlation functions and their use in astro- 
physics. In particular, it is possible to compute the equivalent of a marked 
n-point correlation function in many cases through the use of a weighted 
n-point computation. 

The new algorithm will be described in an upcoming journal paper on fast 
computation of n-point correlation functions. This paper will also present our 
latest results of running weighted n-point correlations on large data sets. The 
new algorithm hinges on two specific advances: 

1. Unique counting. The algorithm was changed so that multiple matches 
to the same template by the same set of points were only counted once. 

For example, the same pair of points could be counted as matching the 
same template twice - once for each mapping between the actual data 
points and the template points. Similarly, in a 3-point computation, a 
template might be matched 6 times by the same triple, or twice, or just 
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once. In the naive, exhaustive algorithm, making the counting unique is 
trivial, but extra care is required during cutoffs in the fast algorithm. 

2. Weighting. Again, the naive algorithm can trivially add weights to its 
computation. The key to the fast algorithm is an efficient way to store 
extra cached information at each node of the kd-tree and the method by 
which that information can be used to get an exact, weighted, answer 
when a cutoff is made. 


7.2 Software availability 

Through the efforts of this grant, all of the software described in this docu- 
ment has been tested, documented, and made publicly available to scientists. 
To download the software, go to www.cs.cmu.edu/~AUTON and click on 
the link for “Astrostatistics Software.” From this page, click on the link to 
download the NPT software. The software is written in plain C and the doc- 
umented source code can be downloaded. On this page, you will also notice 
additional software packages made available to the astrophysics community 
through the continuing efforts of the PiCA Group. 
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